On the use of Bernoulli mixture models for text classification
Abstract
Mixture modelling of class-conditional densities is a standard pattern recognition technique. Although most research on mixture models has concentrated on mixtures for continuous data, emerging pattern recognition applications demand extending research efforts to other data types. This paper focuses on the application of mixtures of multivariate Bernoulli distributions to binary data. More concretely, a text classification task aimed at improving language modelling for machine translation is considered. © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Related papers
A Comparison of Event Models for Naive Bayes Text Classification
Recent approaches to text classification have used two different first-order probabilistic models for classification, both of which make the naive Bayes assumption. Some use a multi-variate Bernoulli model, that is, a Bayesian network with no dependencies between words and binary word features (e.g. Larkey and Croft; Koller and Sahami). Others use a multinomial model, that is, a uni-gram language model with i...
Text Classification from Labeled and Unlabeled Documents Using
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. We introduce an algorithm for learning from lab...
Mixture of Experts Classification Using a Hierarchical Mixture Model
A three-level hierarchical mixture model for classification is presented that models the following data generation process: (1) the data are generated by a finite number of sources (clusters), and (2) the generation mechanism of each source assumes the existence of individual internal class-labeled sources (subclusters of the external cluster). The model estimates the posterior probability of cla...
ICA Mixture Models for Unsupervised Classification of Non-Gaussian Sources and Automatic Context Switching in Blind Signal Separation
An unsupervised classification algorithm is derived from an ICA mixture model assuming that the observed data can be categorized into several mutually exclusive data classes whose components are generated by linear mixtures of independent non-Gaussian sources. The algorithm finds the independent sources, the mixing matrix for each class and also computes the class membership probability for each d...
Latent class models for classification
An overview is provided of recent developments in the use of latent class (LC) and other types of finite mixture models for classification purposes. Several extensions of existing models are presented. Two basic types of LC models for classification are defined: supervised and unsupervised structures. Their most important special cases are presented and illustrated with an empirical example. © 20...
Publication date: 2002